Towards autoscaling of Apache Flink jobs
Authors
Abstract
Data stream processing has been gaining attention in the past decade. Apache Flink is an open-source distributed engine that is able to process a large amount of data in real time with low latency. Computations are distributed among the nodes of a cluster. Currently, provisioning the appropriate amount of cloud resources must be done manually ahead of time. A dynamically varying workload may exceed the capacity of the cluster, or leave resources underutilized. In our paper, we describe an architecture that enables the automatic scaling of Flink jobs on Kubernetes based on custom metrics, and a simple policy. We also measure the effects of state size and target parallelism on the duration of the scaling operation, which should be considered when designing an autoscaling policy, so that the job respects a Service Level Agreement.
Similar resources
Approximate Stream Analytics in Apache Flink and Apache Spark Streaming
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between the output accuracy and computation effi...
Development of a News Recommender System based on Apache Flink
The amount of data on the web is constantly growing. The separation of relevant from less important information is a challenging task. Due to the huge amount of data available in the World Wide Web, the processing cannot be done manually. Software components are needed that learn the user preferences and support users in finding the relevant information. In this work we present our recommender ...
A Study of Execution Strategies for openCypher on Apache Flink
The concept of big data has become popular in recent years due to the growing demand of handling datasets of large sizes. A lot of new frameworks have been proposed to deal with the problem of processing, analysis and storage of big data. As one of them, Apache Flink is an open source platform allowing for distributed stream and batch data processing. Cypher, a declarative query language develo...
On the usability of Hadoop MapReduce, Apache Spark & Apache Flink for data science
Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of advanced dataflow oriented platforms, most prominently...
State Management in Apache Flink®: Consistent Stateful Distributed Stream Processing
Stream processors are emerging in industry as an apparatus that drives analytical but also mission critical services handling the core of persistent application logic. Thus, apart from scalability and low-latency, a rising system need is first-class support for application state together with strong consistency guarantees, and adaptivity to cluster reconfigurations, software patches and partial...
Journal
Journal title: Acta Universitatis Sapientiae: Informatica
Year: 2021
ISSN: 1844-6086, 2066-7760
DOI: https://doi.org/10.2478/ausi-2021-0003